Abstract

This paper surveys stochastic epidemic models. A simple stochastic epidemic model is
defined, and exact and asymptotic model properties (the latter relying on a large
community) are presented. The purpose of modelling is illustrated by studying effects of
vaccination and also in terms of inference procedures for important parameters, such as
the basic reproduction number and the critical vaccination coverage. Several
generalizations towards realism, e.g. multitype and household epidemic models, are also
presented, as is a model for endemic diseases.

Keywords: Basic reproduction number, critical vaccination coverage, household epidemic,
multitype epidemic, stochastic epidemic, threshold theorem.



1 Introduction 

Early modelling contributions for infectious disease spread were often for specific diseases.
For example, Bernoulli (1760) aimed at evaluating the effectiveness of a certain technique of
variolation against smallpox, and Ross (1911) modelled the transmission of malaria. One
of the first more general and rigorous studies was made by Kermack and McKendrick
(1927). Later important contributions were made by, for example, Bartlett (1949) and
Kendall (1956), both also considering stochastic models.

Early models were often deterministic, and the questions addressed were, for example:
Is it possible that there is a big outbreak infecting a positive fraction of the
community? How many will get infected if the epidemic takes off? What are the effects
of vaccinating a given community fraction prior to the arrival of the disease? What is the
endemic level? As problems were resolved, the simple models were generalised in several
ways towards making them more realistic. Some such extensions were to allow for a
community with different types of individual, to allow for non-uniform mixing between
individuals (i.e. infectious individuals are not equally likely to infect all individuals),
for example due to social or spatial aspects, and to allow for seasonal variations.

Another generalisation of the initial simple deterministic epidemic model was to study
stochastic epidemic models. A stochastic model is of course preferable when studying a
small community. But even when considering a large community, which deterministic
models are primarily aimed at, some additional questions can be raised when considering
stochastic epidemic models. For example: What is the probability of a major outbreak?
And for models describing an endemic situation: How long is the disease likely to persist
(with or without intervention)? Later, stochastic models have also been shown to be
advantageous when the contact structure in the community contains small complete graphs,
households and other local social networks being common examples. Needless to say,
both deterministic and stochastic epidemic models have important roles to play;
however, the focus in the present paper is on stochastic epidemic models.

In the present paper we study a fairly simple class of stochastic epidemic models in
a closed community, and present properties of the model. These are both small-population
properties and approximations assuming a large community: early stage behaviour
of the epidemic, the final epidemic size distribution and the duration of the epidemic. The
main large-population approximation results can be summarized as follows. Assuming
a large population, the early stages of the epidemic can be approximated by a branching
process, where "giving birth" corresponds to "infecting someone". If the branching
process/epidemic is super-critical, it is possible that a large epidemic outbreak occurs
(corresponding to the branching process growing beyond all limits). If this happens, a
balance equation determines the final number of infected, up to a Gaussian fluctuation
of smaller order. As regards the duration of a major outbreak, the whole outbreak is
divided into three parts: the beginning (up to when a small fraction have been infected),
the main part (in which nearly all infections take place), and the end (when the last small
fraction of people get infected). These parts last for durations of order log n, 1 and
log n respectively, thus making the total duration of order log n.

We will also describe how the models can be applied to answer epidemiological questions, 
for example how to estimate important epidemiological parameters from outbreak data 
and how to study effects of interventions such as vaccination. We then describe many 
important extensions of stochastic epidemic models aiming at making them more realistic 
and give some key references. The paper does not, however, claim to be a complete reference
guide to all important contributions in stochastic epidemic models.

In Section 2 we first define the deterministic general epidemic model and derive some
properties of it, then describe some cases where a deterministic model is insufficient, and
end by defining what we call the standard stochastic SIR epidemic model. In Section 3 we
present properties of the model, both exact results for a small population and
approximations relying on a large community. In Section 4 we describe how the models can
be used to answer epidemiological questions, and in Section 5 we describe a number of model
generalizations and also a model for an endemic infectious disease.

2 Stochastic epidemic models - why?



2.1 Deterministic epidemic models 

One simple model, the deterministic general epidemic model (e.g. Bailey, 1975, Ch. 6.2),
can be defined by two differential equations. It is assumed that at any time point an
individual is either susceptible (s), infected and infectious (i) or recovered and immune
(r). Such individuals are from now on called susceptibles, infectives and recovered,
respectively.

The model makes the following assumptions: only susceptible individuals can get infected
and, after having been infectious for some time, an individual recovers and becomes
completely immune for the remainder of the study period. Finally, we assume there are
no births, deaths, immigration or emigration during the study period; the community is
said to be closed. A consequence of the assumptions is that individuals can only make
two moves: from S to I and from I to R. For this reason the model is said to be an SIR
epidemic model. Models having no immunity (individuals that recover become susceptible
immediately) are called SIS models, models having a latent state when infected, before
becoming infectious, are often called SEIR models ("E" for exposed but not infectious),
models where immunity wanes after some time are called SIRS models, and so forth. Models
that allow for births, deaths, immigration or emigration are referred to as having
demography or having a dynamic community. The focus in this paper is on SIR models in a
closed community; see however Section 5.3 for a model allowing births and deaths, and
Section 5.4 for a discussion about latency periods.

Let s(t), i(t) and r(t) denote the community fractions of susceptibles, infectives and
recovered, respectively. Since these are fractions and the community is closed, we have
s(t) + i(t) + r(t) = 1 for all $t \ge 0$. From the assumptions mentioned above, together
with the assumption of the community being homogeneous and people mixing homogeneously,
the deterministic general epidemic model is defined by the following set of differential
equations:

\[
\begin{aligned}
s'(t) &= -\lambda s(t)\, i(t), \\
i'(t) &= \lambda s(t)\, i(t) - \gamma i(t), \\
r'(t) &= \gamma i(t).
\end{aligned}
\tag{1}
\]

These differential equations, together with the starting configuration
$s(0) = 1 - \varepsilon$, $i(0) = \varepsilon$ and $r(0) = 0$, define the model.

The initial fraction of infectives $\varepsilon > 0$ is often assumed to be small, as
indicated by the notation $\varepsilon$. It must however be positive, since otherwise all
derivatives in (1) are identically 0 and nothing happens. The reason for assuming that
$r(0) = 0$ is that initially immune individuals play no part in the dynamics so, up to a
normalizing constant, initially immune individuals may simply be ignored. Some authors
choose to let $s(0) = 1$, $i(0) = \varepsilon$ and $r(0) = 0$. The use of "fraction" is
then somewhat misleading, but s(t) has the interpretation of being the fraction still
susceptible at t among the initial susceptibles. When $\varepsilon$ is small (as we will
nearly always assume) there is hardly any difference between the two parametrisations.

The term $\lambda s(t) i(t)$ in Equation (1) comes from the fact that susceptibles must have
contact with infectives in order to get infected, so the assumption about uniform mixing
(mass-action) implies that infections occur at a rate proportional to s(t)i(t). This term
is non-linear, which makes the solution of the system of differential equations non-trivial.
Finally, it is worth pointing out that since s(t) + i(t) + r(t) = 1 it is actually enough to
keep track of two of the quantities.

By studying the differential equations it is straightforward to show that s(t) is
monotonically decreasing down to $s(\infty)$, say, and r(t) is monotonically increasing up
to $r(\infty)$. The differential equation for i(t) can be written as
$i'(t) = i(t)(\lambda s(t) - \gamma)$. So, if $\lambda s(0) > \gamma$, then i(t) initially
increases, but eventually, when s(t) has decreased enough, i(t) starts decreasing. If on
the other hand $\lambda s(0) < \gamma$, then i(t) decreases from the start, with the effect
that little will happen as t tends to infinity (in both cases it can be shown that
$i(\infty) = 0$). This dichotomy is illustrated in Figure 1, where we have plotted
(s(t), i(t), r(t)). To the left this is done for the case $\lambda = 1.5$ and $\gamma = 1$,
and to the right for $\lambda = 0.5$ and $\gamma = 1$, both having initial configuration
(s(0), i(0), r(0)) = (0.99, 0.01, 0). In the left figure a substantial fraction (58.3%)
eventually get infected, whereas in the right figure this fraction is negligible (only an
additional 0.9% get infected). We say that a major outbreak has occurred in the first case
and a minor outbreak in the latter case. Since i(0) is assumed small (and s(0) hence close
to 1), the critical value separating the two very different scenarios is
$R_0 := \lambda/\gamma = 1$.

Figure 1: Solution of the differential system defined in (1), showing s(t), i(t) and r(t).
Both figures have initial configuration s(0) = 0.99, i(0) = 0.01. To the left is the case
with $\lambda = 1.5$ and $\gamma = 1$ (so $R_0 = 1.5$), and to the right is the case
$\lambda = 0.5$ and $\gamma = 1$ (so $R_0 = 0.5$).
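The dichotomy can be reproduced numerically. The following is a minimal sketch, not the computation behind Figure 1: it integrates system (1) with a classical fourth-order Runge-Kutta scheme. The function names `sir_rhs` and `integrate_sir`, the step size and the time horizon are illustrative assumptions.

```python
def sir_rhs(s, i, lam, gamma):
    """Right-hand side of system (1) for (s, i); r follows from r = 1 - s - i."""
    new_infections = lam * s * i
    return -new_infections, new_infections - gamma * i


def integrate_sir(lam, gamma, eps=0.01, t_end=60.0, dt=0.01):
    """Integrate (1) from t = 0 to t_end, starting at (1 - eps, eps, 0).

    Returns the state (s, i, r) at time t_end.
    """
    s, i = 1.0 - eps, eps
    for _ in range(int(round(t_end / dt))):
        k1s, k1i = sir_rhs(s, i, lam, gamma)
        k2s, k2i = sir_rhs(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i, lam, gamma)
        k3s, k3i = sir_rhs(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i, lam, gamma)
        k4s, k4i = sir_rhs(s + dt * k3s, i + dt * k3i, lam, gamma)
        s += dt * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        i += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
    return s, i, 1.0 - s - i


if __name__ == "__main__":
    for lam in (1.5, 0.5):
        s, i, r = integrate_sir(lam, gamma=1.0, t_end=100.0)
        print(f"lambda={lam}: s={s:.4f}, i={i:.2e}, r={r:.4f}")
```

Running both parameter settings of Figure 1 to a large time horizon gives a final recovered fraction that is substantial in the first case and close to the initial fraction of infectives in the second.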



The ratio $R_0 = \lambda/\gamma$ is hence of fundamental importance and can be interpreted
as the average number of new infections caused by an infectious individual before
recovering. The ratio is often referred to as the basic reproduction number (a term with
its origin in demography: the average number of individuals that one individual
reproduces) and is denoted by $R_0$:

\[ R_0 = \frac{\lambda}{\gamma}. \tag{2} \]

When $R_0 > 1$ the epidemic takes off and when $R_0 < 1$ there is no (big) epidemic. The
differential equations (1) can also be used to obtain a balance equation for the final state
$(s(\infty), 0, r(\infty))$. By dividing the first equation by the last we get
$ds/dr = -R_0 s$, which implies that $s(t) = s(0) e^{-R_0 r(t)}$. The fact that
$i(\infty) = 0$ implies that $s(\infty) = 1 - r(\infty)$; at the end of the epidemic there
are no infectives, only susceptibles and recovered (immune). From this we get a balance
equation determining the fraction $z = r(\infty)$ that had been infected by the end of the
epidemic:

\[ 1 - z = (1 - \varepsilon)\, e^{-R_0 z}. \tag{3} \]

The balance equation can be interpreted as follows: in order not to have been infected
(which a fraction 1 - z satisfy) you must belong to those not initially infected (the first
factor on the right) and you must escape the infection pressure $R_0 z$ caused by the
fraction z who were infected. In Figure 2 we have plotted the solution z as a function of
$R_0$, when starting with a negligible fraction of initial infectives. The threshold value
of 1 is clearly seen to be the value above which a positive fraction gets infected.
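The balance equation (3) has no closed-form solution, but it is easy to solve numerically. The sketch below is one possible approach, assuming a plain fixed-point iteration started from z = 1, which decreases monotonically to the largest root in [0, 1]. The function name `final_size` and the tolerance settings are illustrative choices.

```python
import math


def final_size(r0, eps=0.0, tol=1e-12, max_iter=100_000):
    """Largest root z in [0, 1] of 1 - z = (1 - eps) * exp(-r0 * z).

    Iterates z <- 1 - (1 - eps) * exp(-r0 * z) from z = 1. Convergence is
    slow when r0 is very close to 1 and eps = 0.
    """
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - (1.0 - eps) * math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


if __name__ == "__main__":
    for r0 in (0.5, 1.5, 2.0, 3.0):
        print(r0, round(final_size(r0), 4))
```

With a negligible initial fraction of infectives (eps = 0) the iteration returns 0 below the threshold and a strictly positive fraction above it.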




Figure 2: The final size of the epidemic as a function of $R_0$ for the deterministic
epidemic model. The initial fraction of infectives is approximated by $i(0) = 0$.



The above results will also be useful when considering a related stochastic epidemic model 
for a large community. 

2.2 When are deterministic models insufficient? 

In the previous subsection we analysed the deterministic general epidemic model and
showed that if $R_0 < 1$ there will only be a small outbreak, and if $R_0 > 1$ there will
be a major outbreak infecting a substantial fraction of the community; how large a fraction
is determined by Equation (3). The results rely on the community being homogeneous
and on individuals mixing uniformly with each other.

Even if the assumptions of a homogeneous uniformly mixing community are accepted, this
model may not be suitable in some cases. For example, if considering a small community,
like an epidemic outbreak in a day care center or school, it seems reasonable to assume
some uncertainty/randomness in the final number infected. Also, even if $R_0 > 1$ and the
community is large, if the outbreak is initiated by only one (or a few) initial infectives
it should be possible that, by chance, the epidemic never takes off. These two arguments
motivate the definition of a related stochastic epidemic model. Later we will also show
two other reasons motivating the use of stochastic epidemic models: they enable parameter
estimates from disease outbreak data to be equipped with standard errors and, when
studying endemic diseases, the question of disease extinction is better suited for
stochastic models.

2.3 A simple stochastic SIR epidemic model 

We now define the standard stochastic SIR epidemic model. Just like for the deterministic 
general epidemic model we assume a closed homogeneous uniformly mixing community, 
and let n denote the size of the community. 

Let S(t), I(t) and R(t) denote the number of susceptibles, infectives and recovered at
time t, respectively, and suppose that at time t = 0 these numbers are given by
S(0) = n - m, I(0) = m and R(0) = 0. The dynamics of the model are defined as follows.
Infectious individuals have "close contact" with other individuals randomly in time at
constant rate $\lambda$, and each such contact is with a randomly selected individual, all
contacts of different infectives being defined to be mutually independent. By "close
contact" is meant a contact close enough to result in infection if the other individual is
susceptible; otherwise the contact has no effect. Any susceptible that receives such a
contact immediately becomes infected and infectious and starts spreading the disease
according to the same rules. Infected individuals remain infectious for a random time I
(the infectious period), after which they stop being infectious, recover and become immune
to the disease. The infectious periods are defined to be independent and identically
distributed (and also independent of the contact processes), having distribution $F_I$ and
mean $E(I) = 1/\gamma$ (to agree with the deterministic model).

The epidemic starts at time t = 0. As the epidemic evolves, according to the rules above,
new individuals (may) get infected and eventually recover, up until the first time T when
there are no infectives in the community. Then no further individuals can get infected,
implying that the epidemic stops. The final state of the epidemic is described by the
ultimate number R(T) infected (recall that I(T) = 0, so the S(T) = n - R(T) remaining
susceptibles make up the rest of the community). The final number of infected R(T) will
consist of those m who were initially infected plus those Z, say, who were infected during
the outbreak. Later we will study the exact and approximate distribution of Z = R(T) - m.
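The definition above translates directly into a simulation of the final size Z. The sketch below is a hedged illustration rather than a reference implementation. It relies on the observation that the final outcome depends only on who contacts whom, not on the timing of contacts, so infectives can be processed one at a time. The function name `simulate_final_size` and the callback `draw_infectious_period` are hypothetical, and the contact rate `lam` is assumed strictly positive.

```python
import random


def simulate_final_size(n, m, lam, draw_infectious_period, rng):
    """One realisation of Z = R(T) - m in the standard stochastic SIR epidemic.

    n: community size, m: initial infectives, lam: contact rate (> 0),
    draw_infectious_period: function rng -> duration I ~ F_I (assumed helper).
    """
    infected = [False] * n
    for j in range(m):
        infected[j] = True
    to_process = list(range(m))  # infectives whose contacts are not yet generated
    z = 0
    while to_process:
        to_process.pop()
        duration = draw_infectious_period(rng)
        # Contacts occur at the points of a Poisson process of rate lam on [0, duration].
        t = rng.expovariate(lam)
        while t < duration:
            target = rng.randrange(n)  # each individual chosen with probability 1/n
            if not infected[target]:
                infected[target] = True
                to_process.append(target)
                z += 1
            t += rng.expovariate(lam)
    return z


if __name__ == "__main__":
    rng = random.Random(2024)
    gamma = 1.0
    sizes = [simulate_final_size(1000, 1, 1.5, lambda r: r.expovariate(gamma), rng)
             for _ in range(20)]
    print(sorted(sizes))
```

Repeating the simulation typically produces a mixture of very small outbreaks and outbreaks infecting a sizeable part of the community, which is the behaviour analysed in Section 3.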

Two choices of distribution for the infectious period have (for mathematical reasons)
received special attention in the literature. The first is where $F_I$ is the exponential
distribution with intensity parameter $\gamma$, which goes under the name of the stochastic
general epidemic model (e.g. Bailey, 1975, Ch. 6.3). Then the model is Markovian and the
Markov process (S(t), I(t), R(t)) has jump intensities closely related to Equation (1) of
the deterministic general epidemic. The second choice of infectious period is where I is
non-random (and equal to $1/\gamma$). This choice is called the continuous-time version of
the Reed-Frost model. An equivalent (for the final outcome) version is where it is assumed
that all infections take place exactly at the end of the infectious period, and this model
was initially defined by Reed and Frost in 1928 in a series of lectures (unpublished). The
Reed-Frost model is mathematically tractable because the events that an individual makes
contact with each of two distinct individuals are independent. This in turn implies that
the Reed-Frost model can be analysed by an Erdős-Rényi graph (e.g. Bollobás, 2001) where
each pair of individuals has an edge between them independently, with probability
$p = 1 - e^{-\lambda/(n\gamma)}$.
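The random graph representation can be illustrated with a short simulation. The sketch below is an illustration under stated assumptions, not code from the literature. It samples an Erdős-Rényi graph with the edge probability given above and takes the final size to be the number of initially susceptible individuals connected to an initial infective. The function name `reed_frost_final_size` is hypothetical.

```python
import math
import random


def reed_frost_final_size(n, m, lam, gamma, rng):
    """Final size Z of a Reed-Frost epidemic via an Erdos-Renyi graph.

    Individuals 0..m-1 are the initial infectives. Each pair is joined
    independently with probability p = 1 - exp(-lam / (n * gamma)).
    """
    p = 1.0 - math.exp(-lam / (n * gamma))
    neighbours = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                neighbours[u].append(v)
                neighbours[v].append(u)
    # Everyone connected to an initial infective ends up infected.
    reached = set(range(m))
    stack = list(range(m))
    while stack:
        u = stack.pop()
        for v in neighbours[u]:
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return len(reached) - m


if __name__ == "__main__":
    rng = random.Random(7)
    print([reed_frost_final_size(200, 1, 1.5, 1.0, rng) for _ in range(10)])
```

Sampling the whole graph costs of order n^2 operations, which is acceptable for illustration but not for very large communities.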



3 Model properties 

We now explain some important properties of the standard stochastic SIR epidemic model.
In Section 3.1 we derive some exact results; the rest of this section is devoted to
approximations assuming a large community.



3.1 Exact distribution 

No matter what infectious period distribution $F_I$ is considered, it is not possible to
derive a simple exact closed-form expression for the time dynamics of the epidemic.
However, it is possible to derive a recursive formula for the final size of the epidemic,
Equation (8) below. This formula, which we now explain, is based on the fact that in order
not to get infected an individual must "escape" infection from all those who did get
infected during the outbreak. This has been done in several ways (e.g. Picard and Lefèvre,
1990), but our outline follows that of Ball (1986), where more details can be found.

The derivation of the recursive formula for the final size uses two main ideas: a Wald's
identity for the final size and the total infection pressure, and the interchangeability of
individuals, which makes it possible to express the probability of getting i additional
infections among the n - m initial susceptibles in terms of getting i infected in a smaller
subset.

We start with the latter result. To this end, fix n and write $\lambda' = \lambda/n$. Let Z
denote the final number infected excluding the initial infectives, so the possible values
for Z are $0, \dots, n-m$. Since individuals are interchangeable we can label the
individuals according to the order in which they get infected. The initial infectives are
labelled $-(m-1), \dots, 0$, those infected during the outbreak are labelled according to
time of infection, $1, \dots, Z$, and those who avoid infection are labelled in any order,
$Z+1, \dots, n-m$. With this labelling we define the total infection pressure A by

\[ A = \lambda' \sum_{i=-(m-1)}^{Z} I_i, \tag{4} \]

i.e. the infection pressure exerted on any individual during the outbreak (sometimes also
called the "total cost" of the epidemic).

Now, let $p_i^{(n-m)} = P(Z^{(n-m)} = i)$ denote the probability that exactly i susceptibles
get infected during the outbreak, explicitly showing the number of initial susceptibles
but suppressing the dependence on the initial number of infectives m. Then, using the
interchangeability of individuals and reasoning in terms of subsets among the initial
susceptibles, it can be shown (Ball, 1986) that for any $i \le k \le n-m$ it holds that

\[
\frac{p_i^{(n-m)}}{\binom{n-m}{i}}
  = \frac{p_i^{(k)}}{\binom{k}{i}}\,
    E\!\left[ e^{-(n-m-k)A^{(k)}} \,\middle|\, Z^{(k)} = i \right].
\tag{5}
\]

In words: the probability that a specified set of i individuals get infected among the
n - m initial susceptibles equals the probability that the same i individuals get infected
within a smaller subset of size k ($k \le n-m$), multiplied by the probability that no one
in the remaining set gets infected, conditional on the first event. The notation $A^{(k)}$
and $Z^{(k)}$ hence refers to the total pressure and final size starting with k
susceptibles (and m infectives).

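We now show a Wald's identity for $Z^{(k)}$ and $A^{(k)}$. Let
$\phi(\theta) = E(e^{-\theta I})$ denote the Laplace transform of the infectious period I.
We then have the following Wald's identity (Ball, 1986):

\[
E\!\left[ \frac{e^{-\theta A^{(k)}}}{\left(\phi(\theta\lambda')\right)^{m+Z^{(k)}}} \right] = 1,
\qquad \theta \ge 0.
\tag{6}
\]

The following steps prove the result:

\[
\begin{aligned}
\left(\phi(\theta\lambda')\right)^{m+k}
  &= E\!\left[ \exp\!\left( -\theta\lambda' \sum_{i=-(m-1)}^{k} I_i \right) \right] \\
  &= E\!\left[ \exp\!\left( -\theta \Big( A^{(k)} + \lambda' \sum_{i=Z^{(k)}+1}^{k} I_i \Big) \right) \right] \\
  &= E\!\left[ e^{-\theta A^{(k)}} \left(\phi(\theta\lambda')\right)^{k-Z^{(k)}} \right],
\end{aligned}
\]

where the last identity follows because the $k - Z^{(k)}$ infectious periods
$I_{Z^{(k)}+1}, \dots, I_k$ are mutually independent and also independent of $A^{(k)}$
(which only depends on the first $Z^{(k)}$ infectious periods and the contact processes of
these individuals). Dividing both sides by $\left(\phi(\theta\lambda')\right)^{m+k}$ gives
the desired result.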

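If we use Wald's identity with $\theta = n - m - k$ and condition on the value of $Z^{(k)}$
we get

\[
\sum_{i=0}^{k}
  \frac{ p_i^{(k)}\, E\!\left[ e^{-(n-m-k)A^{(k)}} \,\middle|\, Z^{(k)} = i \right] }
       { \left(\phi((n-m-k)\lambda')\right)^{m+i} } = 1.
\tag{7}
\]

Using Equation (5) in the equation above we get

\[
\sum_{i=0}^{k}
  \frac{ \binom{k}{i}\, p_i^{(n-m)} }
       { \binom{n-m}{i} \left(\phi((n-m-k)\lambda')\right)^{m+i} } = 1.
\]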

Simplifying the equation, returning to $\lambda = \lambda' n$ and putting $p_k^{(n-m)}$ on
one side, we obtain the recursive formula for the final size distribution $p_k^{(n-m)}$,
$k = 0, \dots, n-m$:

Exact final size distribution:

\[
p_k^{(n-m)} = \binom{n-m}{k} \left[\phi((n-m-k)\lambda/n)\right]^{m+k}
  - \sum_{i=0}^{k-1} \binom{n-m-i}{k-i} \left[\phi((n-m-k)\lambda/n)\right]^{k-i} p_i^{(n-m)}.
\tag{8}
\]

Note that this is a recursive formula. For example, solving Equation (8) for k = 0 and
then for k = 1 gives, after some algebra,

\[
p_0^{(n-m)} = \left[\phi((n-m)\lambda/n)\right]^{m},
\]
\[
p_1^{(n-m)} = (n-m)\,\phi((n-m-1)\lambda/n)
  \left( \left[\phi((n-m-1)\lambda/n)\right]^{m} - \left[\phi((n-m)\lambda/n)\right]^{m} \right).
\]

In order to compute $p_k^{(n-m)}$ using (8) it is hence necessary to sequentially compute
$p_0^{(n-m)}$ up to $p_{k-1}^{(n-m)}$. As a consequence the formula is not very
enlightening, and it may be numerically very unstable when k (and hence $n - m \ge k$) is
large. Even when it is possible to compute $p_k^{(n-m)}$ using (8) with n - m being large,
an approximate formula can be more informative. For this reason we devote the rest of this
section to the case where n - m is large.
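For small communities the recursion (8) is straightforward to evaluate. The following sketch is a direct transcription of the recursion in floating-point arithmetic. The function name `exact_final_size_distribution` is an illustrative choice, and the Laplace transform `phi` is supplied by the caller. As noted above, the recursion loses accuracy for large n - m, so the sketch is only intended for small examples.

```python
import math


def exact_final_size_distribution(n, m, lam, phi):
    """Return [p_0, ..., p_{n-m}] computed from recursion (8).

    n: community size, m: initial infectives, lam: contact rate,
    phi: Laplace transform of the infectious period, phi(s) = E[exp(-s * I)].
    """
    big_n = n - m  # number of initial susceptibles
    p = []
    for k in range(big_n + 1):
        phi_k = phi((big_n - k) * lam / n)
        value = math.comb(big_n, k) * phi_k ** (m + k)
        for i in range(k):
            value -= math.comb(big_n - i, k - i) * phi_k ** (k - i) * p[i]
        p.append(value)
    return p


if __name__ == "__main__":
    gamma = 1.0
    exponential_phi = lambda s: gamma / (gamma + s)  # I ~ Exp(gamma)
    dist = exact_final_size_distribution(10, 1, 1.5, exponential_phi)
    for k, prob in enumerate(dist):
        print(k, round(prob, 4))
```

For serious use with larger n - m, exact rational or extended-precision arithmetic would be one way to control the instability, at the price of speed.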

3.2 Early stage approximation 

We start by deriving an approximation for the early stages of an outbreak. The
approximation relies on the initial number n - m of susceptibles being large. The key
reason for having an approximation during the early stages of an outbreak when there are
many initial susceptibles is that it is then very unlikely that any of the first
infectious contacts happen to be with the same susceptible individual. Put differently, it
is very likely that all of the first infectious contacts happen with distinct individuals.
As a consequence, the numbers of individuals that distinct infectives infect (during the
early stages) are independent and identically distributed random variables. This suggests
that the epidemic process may be approximated by a suitable branching process (e.g. Haccou
et al., 2005). In this subsection we make the approximation more precise and derive
results for when a major outbreak (infecting a non-negligible fraction) is possible, and if
so, also what the probability of a major outbreak is.

Assume that the initial number of infectives $m \ge 1$ is fixed and that n - m is large
(later we will take limits as $n \to \infty$). One way of constructing the epidemic is as
follows. Label the n individuals $-(m-1), \dots, n-m$, where the first m individuals refer
to the initial infectives and the remaining n - m refer to the initial susceptibles, but
otherwise arbitrarily; note that here the labelling is not according to order of infection.
Let $I_{-(m-1)}, \dots, I_{n-m}$ be i.i.d. with distribution $F_I$ (the infectious periods),
let $\chi_{-(m-1)}(\cdot), \dots, \chi_{n-m}(\cdot)$ be i.i.d. Poisson processes all having
constant intensity $\lambda$ (the contact processes), and let $U_1, \dots, U_{n-m}$ be
i.i.d. uniform random variables on the unit interval (to be used for determining who is
contacted when a close contact occurs). The epidemic process is defined using these random
variables as follows. Start at t = 0. The contact processes of the initial infectives,
$\chi_{-(m-1)}(\cdot), \dots, \chi_0(\cdot)$, are "activated" and the infectious periods of
these individuals start. Time increases without anything happening until the first time
point $t_1$ when either one of the activated contact processes has an "event", or one of the
infectious periods $I_{-(m-1)}, \dots, I_0$ stops. If the latter happens, the corresponding
contact process is deactivated, that person's infectious period stops, and the individual
recovers and becomes immune. If the former happens, the corresponding individual has an
infectious contact. Which person is contacted is determined by $U_1$: the person being
contacted has index $\lfloor nU_1 \rfloor - (m-1)$, where $\lfloor \cdot \rfloor$ denotes
the integer part, implying that each individual i ($i = -(m-1), \dots, n-m$) is selected
with equal probability 1/n. If the contacted person is still susceptible he/she gets
infected, the corresponding infectious period is started and the contact process is
activated; if the contacted person has already been infected nothing happens. Time then
moves on until either an infectious period is terminated or a contact among the activated
contact processes occurs. Depending on what happens, an infectious period is terminated or
a randomly selected individual is contacted (and infected unless it has already been
infected). This goes on until the first (random) time point T when there are no activated
contact processes and all initiated infectious periods have stopped. Since there is a
finite number of individuals, all having finite infectious periods with probability 1, T
will be finite with probability 1.

It is straightforward to check that this construction gives the desired epidemic model:
individuals have infectious periods distributed according to $F_I$, and while infectious
each individual has contacts with randomly selected individuals at rate $\lambda$. Another
nice feature of this construction is that it is in fact possible to construct a sequence of
epidemics, indexed by the population size n, as well as a homogeneous branching process
with life-lengths distributed according to $F_I$ and constant birth rate $\lambda$, using
the same set of random variables. This can be used to couple the whole sequence of
epidemics and the limiting branching process, and to show that they agree up to some random
point. We refer to Ball (1983) for a more detailed study of this; here we just give the
heuristics.

The branching process is simply constructed without the uniform numbers, so in the
branching process a new individual is "born" whenever a contact occurs. The same
applies to the epidemic (having n - m initial susceptibles) with "born" replaced by
"infected", except when a contact in the epidemic is with an already infected individual.
As a consequence, the epidemic and the branching process agree up until the first time
point when a contact in the epidemic is with an already infected individual, denoted a
ghost contact in Mollison (1977). Initially there are n - m susceptible and m infected, so
the probability that the first contact is not a ghost contact equals (n - m)/n, which is
close to 1 when n is large. Given this, the second contact is also a non-ghost contact with
probability (n - m - 1)/n, and so on; note the resemblance with the well-known birthday
problem. From this it follows that the branching process and the epidemic agree at least up
until the k'th contact (i.e. no ghost contact has occurred) with probability

\[ P(\text{no ghost among first } k \text{ contacts}) = \frac{(n-m)_k}{n^k}, \tag{9} \]

where $r_j := r(r-1)\cdots(r-j+1)$. Recalling that m and k are much smaller than n and
using the well-known approximation $1 - \varepsilon \approx e^{-\varepsilon}$, we get the
following approximation for the probability in (9):

\[
P(\text{no ghost among first } k \text{ contacts})
  \approx e^{-\left(\frac{m}{n} + \frac{m+1}{n} + \cdots + \frac{m+k-1}{n}\right)}
  = e^{-k(k-1+2m)/2n}.
\]

For large n this probability is close to 1 whenever $k = o(\sqrt{n})$. From this it follows
that, with large probability, the epidemic and the branching process agree at least up
until there have been k contacts, where k is small in relation to $\sqrt{n}$.
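The quality of the exponential approximation to (9) is easy to check numerically. The sketch below compares the exact product with the approximation. The function names `prob_no_ghost_exact` and `prob_no_ghost_approx` are illustrative.

```python
import math


def prob_no_ghost_exact(n, m, k):
    """Exact probability (9): product of (n - m - j) / n for j = 0, ..., k - 1."""
    prob = 1.0
    for j in range(k):
        prob *= (n - m - j) / n
    return prob


def prob_no_ghost_approx(n, m, k):
    """Approximation exp(-k (k - 1 + 2 m) / (2 n)), valid when m, k << n."""
    return math.exp(-k * (k - 1 + 2 * m) / (2 * n))


if __name__ == "__main__":
    n, m = 10_000, 1
    for k in (5, 10, 50, 100, 300):
        print(k, round(prob_no_ghost_exact(n, m, k), 4),
              round(prob_no_ghost_approx(n, m, k), 4))
```

With n = 10 000 the two expressions stay close to each other and close to 1 for k well below 100 (the square root of n), and both drop off once k is of that order or larger.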

The above heuristics (made precise in Ball, 1983) motivate that we can approximate the
epidemic with the branching process up until k births have occurred. The advantage of
this approximation is that branching processes are well-studied (e.g. Haccou et al., 2005).
For instance, our branching process (with life-length distribution $I \sim F_I$ and
constant birth rate $\lambda$) has mean offspring number $\lambda E(I) = \lambda/\gamma$, a
quantity previously denoted by $R_0$. The branching process is sub-critical if $R_0 < 1$,
critical if $R_0 = 1$ and super-critical if $R_0 > 1$. The total progeny $Z_\infty$, the
number of individuals ever born in the branching process, is known to be finite (the
branching process goes extinct) with probability 1 if $R_0 \le 1$. And, if $R_0 > 1$, then
$Z_\infty$ has a finite part, and is infinite (grows beyond all limits) with a strictly
positive probability p that can be determined. In either case, the finite part of
$Z_\infty$ has a well-defined distribution depending on m, $\lambda$ and $F_I$, cf. Jagers
(1975, Ch 2.11).

Due to the approximation outlined above, the branching process and the epidemic coincide
on the part of the sample space where the branching process goes extinct (whether
sub-critical or not), with arbitrarily large probability if n is large enough. As a
consequence, we may approximate $P(Z_n = j)$ with $P(Z_\infty = j)$ for small and moderate
j, a distribution which is quite complicated except for some special cases. Computing an
expression for $p = P(Z_\infty = \infty)$ is however easier. We compute the complementary
probability, i.e. the probability 1 - p that the branching process, starting with m
individuals, goes extinct. For this to happen all m initial lineages must go extinct. So if
q denotes the probability that one lineage goes extinct, it follows that $1 - p = q^m$. In
order to obtain an expression for q we condition on the number X of individuals the initial
individual gives birth to before dying. Given that X = j, all these j offspring must have
lineages that go extinct, an event that happens with probability $q^j$. It follows that q
must satisfy $q = E(q^X)$. The number X that an individual gives birth to during the
life-length $I \sim F_I$ of course depends on I. Given this duration, I = y, the number of
births is Poisson distributed with mean $\lambda y$, since contacts occur according to a
Poisson process having rate $\lambda$. So, if we denote the Laplace transform of the
infectious period I by $\phi(s) = E(e^{-sI})$, then the following relation must hold for q:

\[
q = E(q^X) = E\big(E(q^X \mid I)\big) = E\big(e^{-\lambda I (1-q)}\big) = \phi(\lambda(1-q)),
\tag{10}
\]

where the third equality uses the probability generating function of a Poisson variable. It
is in fact known (e.g. Jagers, 1975, p 9) that q is the smallest solution to this equation.
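Equation (10) can be solved by simple iteration. The sketch below is a minimal illustration: since the right-hand side is increasing in q, iterating from q = 0 yields an increasing sequence that converges to the smallest solution in [0, 1]. The function name `extinction_probability` and the stopping rule are illustrative assumptions, and convergence is slow when the mean offspring number is very close to 1.

```python
import math


def extinction_probability(lam, phi, tol=1e-12, max_iter=1_000_000):
    """Smallest root q in [0, 1] of q = phi(lam * (1 - q)), Equation (10).

    lam: birth (contact) rate, phi: Laplace transform of the life-length I.
    """
    q = 0.0
    for _ in range(max_iter):
        q_new = phi(lam * (1.0 - q))
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q


if __name__ == "__main__":
    gamma, lam, m = 1.0, 1.5, 1
    for name, phi in (("exponential", lambda s: gamma / (gamma + s)),
                      ("constant", lambda s: math.exp(-s / gamma))):
        q = extinction_probability(lam, phi)
        print(name, "q =", round(q, 4), "P(major outbreak) =", round(1.0 - q ** m, 4))
```

The two infectious period distributions in the usage example share the same mean, and hence the same mean offspring number, but give different outbreak probabilities.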

Early stage approximation. To summarize, when n is large, the initial phase of the
epidemic can be approximated by a homogeneous branching process with birth rate $\lambda$
and life-length distribution $I \sim F_I$ having Laplace transform $\phi(s)$. If
$R_0 := \lambda E(I) = \lambda/\gamma \le 1$ the final size of the epidemic is bounded in
probability, whereas if $R_0 > 1$ it is not. The approximation further tells us that when
$R_0 > 1$ the epidemic will be "minor" (bounded) with probability $q^m$ and "major"
(unbounded) with probability $p = 1 - q^m$, where q is the smallest solution to (10). The
distribution of outbreak sizes in the "minor" case can also be determined from branching
process theory.

We end this section with an example illustrating our results. In the section to follow we 
return to the situation where the branching process grows beyond all limits, corresponding 
to a major outbreak in the epidemic. 

Example. Suppose the epidemic is initiated with $m = 1$ individual and that there are
$n - 1 = 999$ initial susceptibles. Further we treat the case where the infectious period
distribution is $F_I = \mathrm{Exp}(\gamma)$, the exponential distribution with rate parameter $\gamma$ (i.e. the general
stochastic epidemic). When $I \sim \mathrm{Exp}(\gamma)$ then $\phi(s) = E(e^{-sI}) = \gamma/(\gamma + s)$. This implies
that $q$ is the smallest solution to $q = \gamma/(\gamma + \lambda(1 - q))$. Solving this quadratic equation
shows that $q = \gamma/\lambda = 1/R_0$ if $R_0 > 1$ and otherwise $q = 1$. This means that there
will be a minor epidemic outbreak with probability $1/R_0$, and a major outbreak with the
remaining probability $1 - 1/R_0$. Further, it can be shown that for this particular infectious
period/life-length distribution (computations omitted) the outbreak probabilities are well
approximated by the corresponding total progeny distribution.

Note that for fixed $\lambda$ and $\gamma$ (and hence $R_0 = \lambda/\gamma$) both $q$ and $P(Z_n = j)$ can be
computed numerically. However, the approximation for $P(Z_n = j)$ relies on not too
many individuals having been infected, since otherwise the branching process approximation
breaks down, so it should only be used for $j$ up to around 20 ($\sqrt{n} \approx 31.6$).
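As a hedged illustration of how such total progeny probabilities can be evaluated, the sketch below uses a standard branching process fact rather than a formula taken from this paper. When $I \sim \mathrm{Exp}(\gamma)$ the offspring number is geometric with $P(X = j) = p(1 - p)^j$, $p = \gamma/(\lambda + \gamma)$, and the total progeny $T$ (here counting the initial individual, which is an assumption about the counting convention) satisfies $P(T = k) = \frac{1}{k}\binom{2k-2}{k-1} p^k (1 - p)^{k-1}$. The function name is hypothetical.

```python
import math


def total_progeny_pmf(k, lam, gamma):
    """P(total progeny = k), initial individual included, geometric offspring.

    Uses P(T = k) = (1/k) * C(2k-2, k-1) * p**k * (1-p)**(k-1) with
    p = gamma / (lam + gamma); logarithms avoid overflow for large k.
    """
    p = gamma / (lam + gamma)
    log_binom = math.lgamma(2 * k - 1) - 2.0 * math.lgamma(k)
    log_pmf = log_binom + k * math.log(p) + (k - 1) * math.log(1.0 - p)
    return math.exp(log_pmf) / k


if __name__ == "__main__":
    lam, gamma = 1.5, 1.0  # illustrative values, R0 = 1.5
    for k in range(1, 6):
        print(k, total_progeny_pmf(k, lam, gamma))
    # The probabilities sum to the extinction probability q, not to 1,
    # because the remaining mass corresponds to an infinite total progeny.
    print(sum(total_progeny_pmf(k, lam, gamma) for k in range(1, 2001)))
```

To exclude the initial infective from the count, evaluate the function at `k = j + 1`.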

3.3 Final size approximation 

In the previous subsection it was shown that when $n$ is large the epidemic is well
approximated by a branching process up until approximately $\sqrt{n}$ individuals have been
born. If the branching process goes extinct this will never happen when $n$ is large enough,
but when the branching process grows beyond all limits (hence implicitly assuming that
$R_0 > 1$) the approximation breaks down. The question is of course what happens with
the epidemic in this case, something which we now briefly outline. The outline is based
on results in Scalia Tomba (1985, 1990), which use the elegant Sellke construction (Sellke,
1983).

When we are only interested in the final number infected we may introduce a different time
scale as follows. Each initially susceptible individual is given a so-called resistance having
$\mathrm{Exp}(1)$ distribution. In the first time step we let each initially infective $i$ "throw out" its
infection pressure $\lambda I_i$ (defined as the contact rate multiplied by the length of the infectious
period) uniformly in the community (so $\lambda I_i/n$ on each individual). Those individuals with
resistances smaller than this infection pressure then become infected and in turn throw
out their infection pressure uniformly, thus increasing the accumulated infection pressure.
This procedure goes on until the first time step when there are no new infections; then
the epidemic stops. It can be shown that this gives the correct final size distribution; the
only difference with this representation lies in the order and time at which individuals get
infected. The advantage of the construction (the Sellke construction) is that at each
time step we add a random number of i.i.d. random variables (the infection pressures),
and this is done until the first time point the accumulated sum no longer exceeds another
sum of random variables (the resistances). Scalia Tomba (1985, 1990) uses this, together
with an embedding argument, to show that the final size distribution coincides with that
of the crossing time of a stochastic process with a straight line. Because the random
process is made up of i.i.d. contributions it obeys a law of large numbers and a central limit
theorem.
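The construction translates directly into a short simulation of the final size. The sketch below is only an illustration under stated assumptions: the infectious periods are taken to be $\mathrm{Exp}(\gamma)$, the resistances are sorted so that individuals are infected in order of increasing resistance, and the function name and parameter values are hypothetical.

```python
import random


def sellke_final_size(n, m, lam, gamma, rng):
    """Final number infected (initial infectives excluded) in a community of
    n individuals with m initial infectives, via the Sellke construction.

    Assumes Exp(gamma) infectious periods; any other sampler could be used.
    """
    # Exp(1) resistances of the n - m initial susceptibles, in increasing order.
    resistances = sorted(rng.expovariate(1.0) for _ in range(n - m))
    # Infection pressure exerted on each individual by the initial infectives.
    pressure = sum(lam * rng.expovariate(gamma) / n for _ in range(m))
    infected = 0
    # The susceptible with the next smallest resistance becomes infected as
    # long as the accumulated pressure exceeds that resistance.
    while infected < len(resistances) and resistances[infected] <= pressure:
        pressure += lam * rng.expovariate(gamma) / n
        infected += 1
    return infected


if __name__ == "__main__":
    rng = random.Random(2024)
    sizes = [sellke_final_size(1000, 1, 1.5, 1.0, rng) for _ in range(10)]
    print(sizes)
```

Only the ordering of the resistances matters for the final size, which is why no explicit time dynamics are simulated.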

The above description is perhaps not very enlightening, but to show the complete result
is quite technical. Something which is easier to explain is an argument for the expression
of the limiting final fraction infected $z$ in case there is a major outbreak. We do this by
deriving a balance equation for $z$. Neglecting the difference between $n$ and $n - m$ (remember
that $n$ is large and $m$ fixed), we have that $nz$ is the final number infected. Further,
the probability of escaping infection from one infected individual $i$ (with infectious period
$I_i$) equals $E(e^{-\lambda I_i/n})$. We then have the following set of approximations:

$$\begin{aligned}
1 - z &= \text{fraction not getting infected} \\
&\approx \text{probability of not getting infected} \\
&\approx \text{probability of escaping infection from all $nz$ getting infected} \\
&\approx E\left(e^{-\lambda I_1/n} \cdots e^{-\lambda I_{nz}/n}\right) \\
&= E\left(e^{-\frac{\lambda}{n} \sum_{i=1}^{nz} I_i}\right) \\
&\approx e^{-\lambda z E(I)} = e^{-R_0 z},
\end{aligned}$$

where the last approximation is simply the law of large numbers. The limiting fraction
infected in case there is a major outbreak should hence be a solution to the equation

$$1 - z = e^{-R_0 z}. \tag{11}$$

Note that this is the same equation as for the deterministic model (3), except that we now
assume a negligible fraction initially infected (i.e. $\epsilon = 0$). It is not hard to show that this
equation always has the solution $z = 0$ (corresponding to a minor outbreak) and, when
$R_0 > 1$, there is another unique solution $z^*$ between 0 and 1 (corresponding to a major
outbreak). In Figure 2 of Section 2.3 the largest solution $z^*$ is plotted as a function of
$R_0$.
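Equation (11) has no closed form solution, but its largest root is easily found numerically. The sketch below is one possible approach, given here as an assumption rather than the method used for the figure: starting from $z = 1$, the iteration $z \leftarrow 1 - e^{-R_0 z}$ decreases monotonically to the largest solution. The function name and tolerance are illustrative.

```python
import math


def final_size_fraction(r0, tol=1e-12, max_iter=100_000):
    """Largest solution z in [0, 1) of 1 - z = exp(-r0 * z).

    For r0 <= 1 the only solution in [0, 1) is z = 0. For r0 > 1 the map
    z -> 1 - exp(-r0 * z) is increasing, so iterating from z = 1 gives a
    decreasing sequence converging to the positive root z*.
    """
    if r0 <= 1.0:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z - z_next) < tol:
            return z_next
        z = z_next
    return z  # not converged within max_iter (r0 extremely close to 1)


if __name__ == "__main__":
    for r0 in (0.8, 1.2, 1.5, 2.0, 3.0):
        print(r0, final_size_fraction(r0))
```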

The above heuristics indicate that the final fraction infected $Z_n/n$ will lie close to either
0 or else, if $R_0 > 1$, close to $z^*$. This can in fact be shown rigorously. Moreover, a central
limit theorem can be shown for the case where there is a major outbreak. The following
theorem summarizes the result for both a minor and a major outbreak (see Andersson
and Britton, 2000a, Ch 4, and references therein, e.g. von Bahr and Martin-Löf, 1980).

Threshold theorem for the final size of the epidemic: Consider the standard
epidemic model with $m$ (fixed) initial infectives and $n - m$ initial susceptibles. If $R_0 < 1$,
then $\bar Z_n := Z_n/n \to 0$ in probability as $n \to \infty$. On the other hand, if $R_0 > 1$, then
$\bar Z_n \to \zeta$ in distribution, where $\zeta$ has a two point distribution defined by $P(\zeta = 0) = q^m$ and
$P(\zeta = z^*) = 1 - q^m$, where $q$ was defined in the previous subsection and $z^*$ is the unique
strictly positive solution to (11). Finally, on the part of the sample space where $\bar Z_n \to z^*$
we have that

$$\sqrt{n}\,(\bar Z_n - z^*) \to N\left(0,\ \frac{z^*(1 - z^*)\left(1 + r^2(1 - z^*)R_0^2\right)}{\left(1 - (1 - z^*)R_0\right)^2}\right), \tag{12}$$

where the second parameter in the normal distribution denotes the variance, in which
$r^2 = V(I)/(E(I))^2$ is the squared coefficient of variation of the infectious period.

To illustrate our results we have simulated an epidemic in a community of $n = 1000$
individuals, starting with $m = 1$ initially infective, and having $\lambda = 1.5$ and $I \sim \mathrm{Exp}(1)$.
Before looking at the simulations we compute the theoretical values. For our model and
parameter choices we have $R_0 = \lambda E(I) = 1.5$ and $r^2 = V(I)/(E(I))^2 = 1/1 = 1$. Using
results from the previous subsection we conclude that the probability of a minor outbreak
equals $q^1 = 1/R_0 \approx 0.667$, and using (11) we find that $z^* \approx 0.583$. The standard deviation
of $Z_n$ equals $\sqrt{n}\sqrt{z^*(1 - z^*)(1 + r^2(1 - z^*)R_0^2)}/(1 - (1 - z^*)R_0) \approx 58.0$.

To see how our limiting results apply to this particular finite case we have simulated
10 000 such epidemics. In Figure 3 we show the resulting histogram for the final outbreak
sizes. The conclusion is that the limiting results agree quite well with the simulations.
First, the proportion of simulations resulting in minor outbreaks (as seen from the figure
there is a clear distinction between minor and major outbreaks) equals 0.678, which is to
be compared with the theoretical value of $2/3 \approx 0.667$. Further, by simple examination
it looks as if the remaining part (the major outbreaks) has a normal distribution with
a mean close to $nz^* \approx 583$ as suggested by the limiting result. Finally, the standard
deviation is of course harder to "see", but nearly all observations seem to lie within
100 from the mean, which agrees well with the theoretical value of 58 for the standard
deviation.
Figure 3: Empirical distribution of final size from 10 000 simulations of the general
stochastic epidemic with $R_0 = 1.5$ in a community with 1000 individuals and one initial
infective.
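The theoretical quantities used in this comparison can be evaluated with a few lines of code. The following sketch assumes the variance expression of the threshold theorem, with $R_0^2$ multiplying $r^2(1 - z^*)$ in the numerator; the function names are hypothetical and the parameter values are those of the example.

```python
import math


def largest_root(r0, iterations=500):
    """Largest solution of 1 - z = exp(-r0 * z), by iteration from z = 1."""
    z = 1.0
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


def major_outbreak_sd(n, r0, r2, z_star):
    """Asymptotic standard deviation of the final number infected Z_n,
    conditional on a major outbreak, from the limiting normal variance."""
    rho = 1.0 - z_star  # limiting fraction escaping infection
    variance = z_star * rho * (1.0 + r2 * rho * r0**2) / (1.0 - rho * r0) ** 2
    return math.sqrt(n * variance)


if __name__ == "__main__":
    n, r0, r2 = 1000, 1.5, 1.0  # values of the example, I ~ Exp(1)
    z_star = largest_root(r0)
    print("P(minor outbreak) =", 1.0 / r0)
    print("n z* =", n * z_star)
    print("sd of Z_n =", major_outbreak_sd(n, r0, r2, z_star))
```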



3.4 Duration of epidemic 

In the previous sections we have studied the questions of main interest: can an outbreak
occur, and if so, with what probability, and how large will these major outbreaks be?
Another question of interest is how long it will take for the epidemic
to peak and eventually to die out. Below we sketch some results in this direction. For
details we refer to Barbour (1975).

If there is only a minor outbreak (which happens with certainty if $R_0 < 1$, but also with
positive probability $q^m$ if $R_0 > 1$), then the disease will disappear after a short time. We
hence focus on the case where we have a major outbreak (hence assuming $R_0 > 1$) and
study how the time to extinction $T = T_n$ depends on the population size $n$.

It was seen that the initial stages of the epidemic could be approximated by a supercritical
branching process up until approximately $\sqrt{n}$ individuals have been infected. Since a
branching process has exponential growth, this will take a time of order $\log n$. Once a
large number of individuals have been infected, the epidemic process may be approximated
by the deterministic counterpart defined in (1). The time for this process, starting with
an arbitrarily small fraction of initial infectives $i_0$, to first grow and then decrease down to
some small value $i(t) = \epsilon$ does not grow with $n$. The last part
of the epidemic is where close to a fraction $z^*$ have already been infected. The epidemic
then behaves like a branching process where individuals infect on average $R_0(1 - z^*)$
individuals, since only a fraction $1 - z^*$ of the community remains susceptible.
It can be shown that this number is smaller than 1 when $R_0 > 1$ as we have assumed.
It follows that the end of the epidemic can be approximated by a subcritical branching
process starting with $n\epsilon$ infectives. The duration for such a process to go extinct is also
of order $\log n$.

To summarize, we have made it plausible that the duration of the epidemic $T_n$, assuming
a major outbreak, has a distribution of the form

$$T_n = c_1 \log n + c_2 + c_3 \log n + Z, \tag{13}$$

where $c_1$, $c_2$, $c_3$ are constants and $Z$ is a random variable, all depending on the model
parameters. To show this result is quite technical, see Barbour (1975).

4 Applications 

There are many applications of modelling of infectious disease spread. Some diseases that
have received much modelling attention over the last decades are for example HIV, smallpox
(because of the threat of terrorist attacks), foot and mouth disease, SARS and most recently
the new (H1N1) influenza. Below we describe some specific methodological questions that
are often addressed.

4.1 Vaccination and other interventions 

One reason for modelling infectious disease spread is to understand how an outbreak can 
be prevented. This can be achieved in several different ways. When it comes to new 
emerging severe infections like SARS, drastic measures like isolation, closing of schools 
and travel restrictions are often put into place. All these measures aim at reducing 
contact rates, i.e. to reduce $R_0$. The effect of a specific preventive measure depends on
the particular disease and also on the community under consideration. 

A somewhat different preventive measure is vaccination. This does not change $R_0$, but
instead it reduces the pool of susceptibles by making individuals immune. We now describe
how to model this and study its effect on the outbreak.

Suppose a vaccine is available prior to the arrival of the disease, and assume a fraction $v$
are vaccinated. Suppose further that the vaccine is perfect in the sense that all vaccinated
individuals become completely immune (see Section 5.4 for extensions).

This will have one effect on the model: the number of initially susceptibles is reduced
from $n - m$ to $n(1 - v) - m =: n' - m$. Other than that the model is the same. However,
the contact rate with a given individual still equals $\lambda/n$, so if we want to use $n'$ instead we
get $\lambda/n = \lambda(1 - v)/n' =: \lambda'/n'$. We can hence compute the exact final size distribution
using the results of Section 3.1, with $n' = n(1 - v)$ instead of $n$ and
$\lambda' = \lambda(1 - v)$ instead of $\lambda$. Similarly, the early stages of the epidemic can be approximated
by a branching process with birth rate $\lambda'$ and mean life-length $1/\gamma$. It follows that the
reproduction number after a fraction $v$ has been vaccinated, denoted $R_v$, satisfies

$$R_v = \lambda'/\gamma = (1 - v)R_0. \tag{14}$$

From the results of Section 3.2 we hence conclude that if $R_v < 1$ there will be no major
outbreak, whereas if $R_v > 1$ there will be a major outbreak with probability $1 - (q')^m$,
where $q'$ is the smallest solution to Equation (10), but with $\lambda' = \lambda(1 - v)$ instead of $\lambda$.

From Section 3.3 we also conclude that if $R_v > 1$, the size of a major outbreak will be
approximately $n'z'^*$, where $z'^*$ is the unique positive solution to Equation (11) with $R_v$ replacing
$R_0$. The central limit theorem also applies, so in case there is a major outbreak the total
number of infected is normally distributed as stated in (12), but with $n'$, $\lambda'$ and $R_v$
instead of $n$, $\lambda$ and $R_0$.

The most important of these results from an applied point of view is (14), i.e. that
the reproduction number after having vaccinated a fraction $v$ in the community satisfies
$R_v = R_0(1 - v)$, and the fact that a major outbreak is impossible if $R_v < 1$. In terms of
$v$ this is equivalent to $v > 1 - 1/R_0$. The critical vaccination coverage, denoted $v_c$ and
defined as the fraction necessary to vaccinate in order to surely prevent a major outbreak,
hence satisfies

$$v_c = 1 - \frac{1}{R_0}. \tag{15}$$

For the numerical example given above, with $R_0 = 1.5$, it hence follows that
$v_c = 1 - 1/1.5 \approx 0.33$. This means that it is enough to vaccinate 33% of the community to prevent
a major outbreak. By vaccinating only 33% the whole community is hence protected, a
state denoted herd immunity.
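The relations $R_v = (1 - v)R_0$ and $v_c = 1 - 1/R_0$ can be wrapped in two small helper functions, shown here as a minimal sketch assuming a perfect vaccine as above. The function names are hypothetical.

```python
def reproduction_number_after_vaccination(r0, v):
    """R_v = (1 - v) * R_0 when a fraction v receives a perfect vaccine."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("the vaccinated fraction v must lie in [0, 1]")
    return (1.0 - v) * r0


def critical_vaccination_coverage(r0):
    """v_c = 1 - 1/R_0; no vaccination is needed when R_0 <= 1."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    for r0 in (1.5, 3.0, 15.0):  # illustrative values of R_0
        v_c = critical_vaccination_coverage(r0)
        print(r0, v_c, reproduction_number_after_vaccination(r0, v_c))
```

Vaccinating exactly the fraction $v_c$ brings the reproduction number down to 1, the threshold value.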

4.2 Estimation 

So far we have been interested in (stochastic) modelling, i.e. given a model and its parameters
we have studied properties of the epidemic. It was shown that the most important
parameter is the basic reproduction number $R_0$, defined as the average number of individuals
a (typical) infective infects during the early stages of the outbreak, where
by "early stages" we mean that few individuals have been infected so the vast majority of
the community is still susceptible. When aiming at preventing an outbreak, another
important quantity is the critical vaccination coverage $v_c$ defined in (15).

In a particular situation it is hence important to know what these parameters are. We
now describe how to estimate $R_0$ and $v_c$ from observing one epidemic outbreak. One
advantage of stochastic modelling is that it not only enables point estimates, but also
standard errors as is now illustrated.

Suppose a major epidemic outbreak occurs in a community of $n$ individuals, $n - m$ being
susceptible and a small number $m$ being externally infected. Let $Z_n$ denote the number
of individuals that were infected during the outbreak (excluding the $m$ index cases), and
let $\bar Z_n = Z_n/n$ denote the corresponding fraction. We want to estimate $R_0$ (and $v_c$) based
on this information. From (11) we know that $R_0$ satisfies

$$R_0 = \frac{-\log(1 - z^*)}{z^*}.$$



We also know that, in case of a major outbreak, $\bar Z_n$ is asymptotically normally distributed
around $z^*$, with standard deviation

$$\frac{\sqrt{z^*(1 - z^*)\left(1 + r^2(1 - z^*)R_0^2\right)}}{\sqrt{n}\,\left(1 - (1 - z^*)R_0\right)}.$$

From this it follows that a consistent and asymptotically normally distributed estimator
of $R_0$ is given by

$$\hat R_0 = \frac{-\log(1 - \bar Z_n)}{\bar Z_n}. \tag{16}$$

The standard deviation of the estimator can be obtained using the delta method (e.g.
Cox, 1998). Let $f(z) = -\log(1 - z)/z$; then $f'(z^*) = 1/(z^*(1 - z^*)) - R_0/z^*$. It follows
that the asymptotic variance of $\hat R_0$ equals

$$V(\hat R_0) \approx (f'(z^*))^2 V(\bar Z_n) \approx \left(\frac{1}{z^*(1 - z^*)} - \frac{R_0}{z^*}\right)^2 \frac{z^*(1 - z^*)\left(1 + r^2(1 - z^*)R_0^2\right)}{n\left(1 - (1 - z^*)R_0\right)^2} = \frac{1 + r^2(1 - z^*)R_0^2}{n z^*(1 - z^*)}.$$

A standard error for $\hat R_0$ is obtained by taking the square root of this and replacing $R_0$ by
$\hat R_0$ and $z^*$ by $\bar Z_n$:

$$\mathrm{s.e.}(\hat R_0) = \sqrt{\frac{1 + r^2(1 - \bar Z_n)\hat R_0^2}{n \bar Z_n(1 - \bar Z_n)}}. \tag{17}$$

The standard error above still contains one unknown quantity: $r^2 = V(I)/(E(I))^2$, the
squared coefficient of variation of the infectious period. From final size data it is impossible
to infer anything about the distribution of the infectious period. The only way to proceed,
unless some prior information is available, is to replace it by some conservative upper
bound, for example $r^2 = 1$.

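Estimation of the critical vaccination coverage $v_c$ is also straightforward. Since $v_c$ is defined by
(15) the natural estimator for $v_c$ is

$$\hat v_c = 1 - \frac{1}{\hat R_0} = 1 - \frac{\bar Z_n}{-\log(1 - \bar Z_n)}. \tag{18}$$

Just like $\hat R_0$ the estimator $\hat v_c$ is consistent and asymptotically normally distributed around
the true value $v_c$. The asymptotic variance can also be obtained using the delta method.
We have $g(x) = 1 - 1/x$, which hence satisfies $g'(R_0) = 1/R_0^2$. The asymptotic variance
of $\hat v_c$ then equals $V(\hat R_0)/R_0^4$. A standard error for $\hat v_c$ is given by

$$\mathrm{s.e.}(\hat v_c) = \frac{\mathrm{s.e.}(\hat R_0)}{\hat R_0^2} = \sqrt{\frac{1 + r^2(1 - \bar Z_n)\hat R_0^2}{n \hat R_0^4 \bar Z_n(1 - \bar Z_n)}}. \tag{19}$$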
As a numerical example we assume that $Z_n = 583$ individuals in a community of $n = 1000$
were infected during an outbreak. Using (16) and (17) we get $\hat R_0 = 1.50$ with standard
error $\mathrm{s.e.}(\hat R_0) = 0.09$ using the conservative upper bound $r^2 = 1$. As for the critical
vaccination coverage the corresponding estimate and standard error are given by $\hat v_c =
0.333$ and $\mathrm{s.e.}(\hat v_c) = 0.04$. Both estimators are asymptotically normally distributed.
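These estimates are straightforward to reproduce. The sketch below computes the point estimates and the delta method standard errors from the final size alone; the function name is hypothetical and the default $r^2 = 1$ is the conservative choice discussed above.

```python
import math


def estimate_from_final_size(z_count, n, r2=1.0):
    """Estimates of R_0 and v_c, with standard errors, from the final number
    infected z_count in a community of size n (major outbreak assumed).

    r2 is the assumed squared coefficient of variation of the infectious
    period, which cannot be estimated from final size data.
    """
    z_bar = z_count / n
    r0_hat = -math.log(1.0 - z_bar) / z_bar
    se_r0 = math.sqrt(
        (1.0 + r2 * (1.0 - z_bar) * r0_hat**2) / (n * z_bar * (1.0 - z_bar))
    )
    vc_hat = 1.0 - 1.0 / r0_hat
    se_vc = se_r0 / r0_hat**2  # delta method with g(x) = 1 - 1/x
    return r0_hat, se_r0, vc_hat, se_vc


if __name__ == "__main__":
    r0_hat, se_r0, vc_hat, se_vc = estimate_from_final_size(583, 1000)
    print("R0 estimate:", r0_hat, "s.e.:", se_r0)
    print("v_c estimate:", vc_hat, "s.e.:", se_vc)
```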

In the inference procedure presented above we only made use of the final number infected. 
If the epidemic process (S(t), I(t), R(t)) is observed continuously we have more detailed 
information and we should hence be able to make more precise inference. This is true 
although the gain in precision is moderate. We omit this type of inference (involving 
martingales using counting process theory) and refer to Andersson and Britton (2000a, 
Ch 9). 

5 Extensions 

The standard stochastic epidemic model of Section 2.3 was defined as a stochastic version
of the deterministic general epidemic model of Section 2.1. The population is finite,
infectious individuals make contacts randomly in time according to a Poisson process
with intensity $\lambda$, each time the person being contacted is chosen randomly, and the length
of the infectious period $I$ is a random variable with distribution $F_I$. Despite these more
realistic model features, the model still contains several simplifying assumptions. In the
present section we touch upon a number of extensions that have been analysed in the
literature. Most of these extensions do not alter the qualitative behaviour of the spread
of infection, but they do change quantitative properties.

5.1 Individual heterogeneity: multitype models 

One assumption in the standard stochastic epidemic model is that all individuals are
similar, except that they may have different lengths of the infectious period. In
reality most populations are heterogeneous, for example with respect to susceptibility, the
degree of social activity and/or how infectious they become if infected. The heterogeneities
may be unknown to some extent, but quite often it is possible to group individuals into
different types of individual, where individuals of the same type have (nearly) identical
behaviour. This separation into different types might for example be by age group,
gender, previous exposure to the disease (giving partial immunity), etcetera.

In such situations it is common to define a multitype epidemic model as follows. Suppose
there are $k$ types of individuals, labelled $1, \ldots, k$, and that the community fractions of the
different types equal $\pi_1, \ldots, \pi_k$. Type $i$ individuals have i.i.d. infectious periods $I_i$ with
distribution $F_i$ having mean $1/\gamma_i$. During the infectious period, an $i$-individual has close
contacts with a given $j$-individual at rate $\lambda_{ij}/n$, $i, j = 1, \ldots, k$.

If the population, i.e. $n$, is large, the epidemic may be approximated by a multitype
branching process. The mean offspring matrix has elements $(m_{ij})$, where $m_{ij} = \lambda_{ij}\pi_j/\gamma_i$.
The basic reproduction number $R_0$ is the largest eigenvalue of the mean offspring matrix
$(m_{ij})$. The case where $m_{ij} = \alpha_i\beta_j\pi_j$, referred to as proportionate, or separable mixing, cf.
Diekmann and Heesterbeek (2000, Ch 5.3), has received special attention for two reasons.
First, this implies that $\alpha_i$ can be interpreted as the (average) infectivity of $i$-individuals
and $\beta_j$ as the (average) susceptibility of $j$-individuals. Secondly the basic reproduction
number then has the simpler form $R_0 = \sum_i \alpha_i\beta_i\pi_i$.
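For a given mean offspring matrix the largest eigenvalue can be computed numerically. The sketch below uses plain power iteration, which is one simple choice among many and assumes a non-negative matrix for which the iteration converges (for example one with all entries strictly positive). The function names and the two-type example values are purely illustrative.

```python
def mean_offspring_matrix(lam, pi, gamma):
    """m[i][j] = lam[i][j] * pi[j] / gamma[i]: the mean number of
    j-individuals infected by one i-individual early in the epidemic."""
    k = len(pi)
    return [[lam[i][j] * pi[j] / gamma[i] for j in range(k)] for i in range(k)]


def dominant_eigenvalue(m, iterations=1000):
    """Largest eigenvalue of a non-negative matrix by power iteration."""
    k = len(m)
    v = [1.0] * k
    value = 0.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        value = max(abs(x) for x in w)
        if value == 0.0:
            return 0.0
        v = [x / value for x in w]  # renormalise so that max(v) = 1
    return value


if __name__ == "__main__":
    # Separable mixing with two types (illustrative values).
    alpha, beta, pi = [1.0, 2.0], [0.5, 1.5], [0.6, 0.4]
    m = [[alpha[i] * beta[j] * pi[j] for j in range(2)] for i in range(2)]
    print("power iteration:", dominant_eigenvalue(m))
    print("closed form:    ", sum(a * b * p for a, b, p in zip(alpha, beta, pi)))
```

In the separable case the matrix has rank one, so the two printed values should agree.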

As in the homogeneous case, only minor outbreaks are possible if $R_0 < 1$ whereas a large
outbreak may occur if $R_0 > 1$. There is a related threshold limit theorem stating what
the probability of a major outbreak is, and a central limit theorem for the final number
of infected of the different types in case the epidemic takes off. The expressions are more
involved, as are the proofs giving the desired results, so we refer to Ball and Clancy
(1993) for details.

5.2 Mixing heterogeneity: household and network models 

Perhaps even more important than allowing individuals to be different in terms of
susceptibility and infectivity is to allow for non-uniform mixing, meaning that an individual
has different (average) contact rates with different individuals. The multitype epidemic
model of the previous section may include some such non-uniform mixing in the sense
that contacts with different types of individuals may be different. However, an assumption
about the contact rates between specific pairs of individuals in the multitype model is
that they are all of order $1/n$. In many real-life situations individuals tend to have a few
other individuals with which they mix at a much higher rate. Epidemic models taking
this type of mixing behaviour into account are often referred to as two-level or multilevel
mixing epidemic models. The two most common examples are household epidemics and
network epidemics.

In a household epidemic individuals are grouped into small groups (households) and it is
assumed that the contact rate between pairs of individuals of the same household equals
$\lambda_h$ and contact rates between pairs of individuals of different households equal $\lambda_c/n$
(and all individuals have i.i.d. infectious periods $I$ with mean $1/\gamma$). Large population
properties for this model were first rigorously analysed by Ball et al. (1997). There they
show that it is necessary to take into account the random distribution of the total number
of infected in a household outbreak. During the initial phase of an outbreak, external
infections are nearly always with not yet infected households, so by treating households as
"super-individuals" the initial phase of the epidemic may be approximated by a multitype
branching process, where the different types refer to household size and also the size of
the household outbreak. The basic reproduction number (now having a more complicated
definition but still having the value of 1 as its threshold) equals $R_0 = (\lambda_c/\gamma)\mu_h$, where
$\mu_h$ is the mean size of a household outbreak in which the size of the selected household
has a size biased distribution, due to the fact that larger households are more likely to be
externally infected as more individuals reside in them.

As before, major outbreaks are only possible if $R_0 > 1$, and when this happens Ball et al.
(1997) show a law of large numbers for the number of households of different sizes and 
outbreak sizes using a balance equation. With some additional effort they also derive a 
central limit theorem for the final size in case of a major outbreak. 

Another type of epidemic model having a local structure with much higher (or all) contact
rates between "neighbours" is called a network epidemic model (e.g. Newman, 2003).
The network, an undirected graph, specifies the social structure in the community upon
which an epidemic spreads. Quite often the network is assumed to be random but having
some pre-specified properties. These could for example be the degree-distribution (the
distribution of the number of neighbours), the clustering coefficient (specifying how
frequently close cycles are present), and the degree correlation of neighbours. Given such a
network and a stochastic epidemic model "on" the network, it is of interest to see if a
major outbreak can occur, and if so what its probability is and what the size of such an
outbreak is. There are still many open problems to be solved in this area (in particular when
considering a dynamic network) but see for example Britton et al. (2007) for some results.
In general, the social structure in the community is more influential for diseases which
are not highly infectious. One such class of diseases is sexually transmitted infections;
in this case an edge between two individuals in the network corresponds to a sexual relation.

5.3 Models for endemicity 

The focus of the present paper has been SIR epidemic models for a closed population, i.e.
not allowing deaths or that new individuals enter the community during the outbreak.
This is of course an approximation of real life but, if focus is on short time behaviour, it
is sensible to make such an approximation. Some infectious diseases are endemic in many
countries, and a question of interest (e.g. Anderson and May, 1991) for such diseases is
to understand their dynamics: to understand why certain diseases are endemic and others
not, and why a given disease may be endemic in one country but not in another. It
is of course also of interest to determine which preventive measures have the potential
of eradicating the disease. We now give a brief outline of this problem area and refer the
reader to the literature for more details; e.g. Anderson and May (1991, pp 128) and Nåsell
(1999), and, when studying measles in particular, e.g. Conlan and Grenfell (2007). We
now define the Markovian SIR epidemic model with demography (Nåsell, 1999).

The population dynamics are very simple: new (susceptible) individuals enter the population
according to the time points of a homogeneous Poisson process at rate $n\mu$, and each
individual lives for an exponentially distributed time with mean $1/\mu$. In words, there is
a steady and constant influx of susceptibles at rate $\mu n$ and each individual dies at rate
$\mu$, so the population size will fluctuate around the equilibrium state $n$, which hence can
be interpreted as "population size".

The disease dynamics are just like for the Markovian version of the standard stochastic
epidemic model: an infectious individual has close contacts with each other individual at
rate $\lambda/n$; if such a contact is with a susceptible individual this individual gets infected
and infectious immediately, and each infectious individual recovers at rate $\gamma$ and becomes
immune for the rest of its life. Each individual dies at rate $\mu$ irrespective of the infection
status. The epidemic process $\{S(t), I(t), R(t); t \ge 0\}$ is initiated by $(S(0), I(0), R(0)) =
(s_0, i_0, r_0)$, where, as before, we assume that $i_0 > 0$. In Figure 4 the various jump rates of
the model are given in the $(s, i)$-plane. As before it is enough to keep track of the numbers
of susceptibles and infectives, since recovered and immune individuals play no further role
in the disease dynamics.

Figure 4: Jump rates for the SIR epidemic model with demography. All jumps are one
unit of length except the jump to the north west in which $s$ decreases and $i$ increases by
one unit (due to someone getting infected).

A question of interest is to study properties of this model when
$n$ is large; we do this by first studying the corresponding deterministic system defined by
the differential equations:

$$\begin{aligned}
s'(t) &= \mu - \lambda s(t)i(t) - \mu s(t), \\
i'(t) &= \lambda s(t)i(t) - \gamma i(t) - \mu i(t), \\
r'(t) &= \gamma i(t) - \mu r(t).
\end{aligned} \tag{20}$$

By equating all derivatives in (20) to 0 we get the equilibrium state(s). It is straightforward
to show that $(s(t), i(t), r(t)) = (1, 0, 0)$, the disease free state, is always a point of
equilibrium. The basic reproduction number $R_0$, defined as the expected number of individuals
a typical individual infects during the early stages of an outbreak, equals $\lambda/(\gamma + \mu)$
since now there are two possible reasons for leaving the infectious state: recovering and
dying. In addition to the disease free equilibrium (which is stable if $R_0 < 1$ and unstable
otherwise) there is another stable equilibrium whenever $R_0 > 1$:

$$(s(t), i(t), r(t)) = (\hat s, \hat i, \hat r) = \left(\frac{1}{R_0},\ \frac{\delta(R_0 - 1)}{R_0},\ 1 - \frac{1}{R_0} - \frac{\delta(R_0 - 1)}{R_0}\right), \tag{21}$$

where $\delta = \mu/(\gamma + \mu)$ is the (small) average fraction of a life that an individual is infectious.
This state is called the endemic level.
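As a quick numerical check of the endemic level, the sketch below computes $(\hat s, \hat i, \hat r)$ from $\lambda$, $\gamma$ and $\mu$ and verifies that the right hand sides of the differential equations vanish there. The function names and the parameter values are illustrative assumptions.

```python
def endemic_level(lam, gamma, mu):
    """Equilibrium (s, i, r) of the deterministic SIR model with demography.

    Returns the disease free state (1, 0, 0) when R_0 <= 1.
    """
    r0 = lam / (gamma + mu)
    delta = mu / (gamma + mu)  # average fraction of a life spent infectious
    if r0 <= 1.0:
        return 1.0, 0.0, 0.0
    s = 1.0 / r0
    i = delta * (1.0 - 1.0 / r0)
    return s, i, 1.0 - s - i


def derivatives(s, i, r, lam, gamma, mu):
    """Right hand sides of the differential equations for (s, i, r)."""
    return (
        mu - lam * s * i - mu * s,
        lam * s * i - gamma * i - mu * i,
        gamma * i - mu * r,
    )


if __name__ == "__main__":
    lam, gamma, mu = 5.1, 1.0, 0.02  # illustrative values, R_0 = 5
    level = endemic_level(lam, gamma, mu)
    print("endemic level:", level)
    print("derivatives there:", derivatives(*level, lam, gamma, mu))
```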

For the stochastic counterpart it is possible to reach the state $I(t) = 0$ from any other given
state in finite time with positive probability. This observation together with the fact that
the state $I(t) = 0$ is absorbing (once the disease disappears it can never return) makes
all other states transient. As a consequence there is only one stationary equilibrium:
the disease free state in which all individuals (fluctuating around the number $n$) are
susceptible. However, when $R_0 > 1$ and $n$ is large there is a drift of $(S(t), I(t), R(t))$
towards the endemic level $(n\hat s, n\hat i, n\hat r)$. So, even though the disease eventually will go
extinct, this may take a long time, and in the mean time the epidemic process will
fluctuate around the endemic level. This type of behaviour is known as quasi-stationarity
(e.g. van Doorn, 1991). Questions of interest are for example: given the population size
$n$ and the parameters $\lambda$, $\gamma$ and $\mu$ (or equivalently $\mu$, $R_0$ and $\delta$), how long will it take for
the disease to go extinct? Or, phrased in another way, how big must the population be
for a given disease to persist in the community, and which features of the disease are
most influential in determining this so called critical community size? Another important
question is to study effects of introducing vaccination in the model. We refer to for
example Nåsell (1999) and Andersson and Britton (2000b) for stochastic methodological
work in the area of endemic diseases.
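The fluctuation around the endemic level and the eventual extinction can be explored by simulating the jump process directly. The following is a minimal sketch using the standard event-by-event (Gillespie type) simulation of a Markov jump process; the function name, the starting state and the parameter values are illustrative assumptions and not taken from the cited references.

```python
import random


def simulate_sir_demography(n, lam, gamma, mu, s, i, r, t_end, rng):
    """Simulate the Markovian SIR model with demography until time t_end
    or until the disease goes extinct (I(t) = 0), whichever comes first.

    Returns (t, s, i, r): the stopping time and the state at that time.
    """
    t = 0.0
    while i > 0:
        rates = [
            n * mu,           # a new susceptible enters the population
            mu * s,           # a susceptible dies
            lam * s * i / n,  # a susceptible gets infected
            gamma * i,        # an infective recovers
            mu * i,           # an infective dies
            mu * r,           # a recovered individual dies
        ]
        total = sum(rates)
        t += rng.expovariate(total)
        if t >= t_end:
            return t_end, s, i, r
        # Pick the event with probability proportional to its rate.
        u = rng.random() * total
        event = 0
        while event < 5 and u >= rates[event]:
            u -= rates[event]
            event += 1
        if event == 0:
            s += 1
        elif event == 1:
            s -= 1
        elif event == 2:
            s, i = s - 1, i + 1
        elif event == 3:
            i, r = i - 1, r + 1
        elif event == 4:
            i -= 1
        elif r > 0:
            r -= 1
    return t, s, i, r


if __name__ == "__main__":
    rng = random.Random(7)
    # Start close to the endemic level for n = 2000 and R_0 = 5.
    print(simulate_sir_demography(2000, 5.1, 1.0, 0.02, 400, 31, 1569, 50.0, rng))
```

Repeating the simulation for increasing $n$ and recording the extinction times gives a rough empirical impression of the critical community size.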

5.4 Other extensions 

We have focused on presenting results for a fairly simple stochastic epidemic model; the 
reason being that even in simple models results are far from trivial. Reality is of course 
much more complicated. There are many features that affect the spread of infections that 
have not been discussed in the paper; below we list some such features, but there are also
many others.

It is well-known that seasonal effects play an important role in disease dynamics. One
reason is that certain viruses or bacteria reproduce at higher speed during certain seasons,
e.g. influenza virus prefers cold weather, but perhaps even more important is the change
in social behaviour over the year. The classical example is the school semester, with the school
start in September being the event that triggered measles outbreaks in for example England
prior to vaccination (e.g. Anderson and May, 1991, Ch 6). One of the most recent
illustrations of effects of schools is by Cauchemez et al. (2008), where the influence of 
transmission within school is estimated by comparing over-all transmission during the 
school semester with over-all transmission during holidays. 

The model studied in the present paper lacks a spatial component. Even though travel has
increased dramatically over the last century there is of course still a spatial component in the
spreading of infections, both between and within countries. In larger modelling/simulation
studies of specific diseases, like the new influenza pandemic, space is always taken into
account (e.g. Yang et al., 2009, and Fraser et al., 2009). This is done by assuming that the 
probability to infect a given individual decays with the distance between the individuals. 
The main effect of introducing space into the model is that the epidemic growth is slower: 
more linear than exponential. 

In the model it was assumed that the infection rate λ was constant during the infectious 
period. This rate can be thought of as the product of two quantities: the rate of having 
contact with other individuals and the probability of transmission upon such a contact. 
In reality both of these quantities typically vary over the infectious period. First there 
is often a latency period when an infected individual is hardly infectious, then there is 
often a period when the individual is highly infectious but still has few or no symptoms. 
Eventually the individual develops symptoms, which reduces social activity, and finally the 
infectiousness starts dropping down to 0. This behaviour will clearly affect the dynamics 
of the disease (a long latency period of course slows down the epidemic 
growth), but it has been shown (Ball, 1983 and 1986) that the final size results of the present 
model still apply. The total infectivity of an infectious period in the present model 
equals the random quantity λI. A model in which the infection rate t time units after 
infection equals C(t)i(t), where C(t) and i(t) are stochastic processes describing social 
activity and infectiousness, may be analysed with the present model 
if the distribution of λI is replaced by the distribution of ∫_0^∞ C(t)i(t) dt. In particular, a 
latency period of arbitrary length has no effect on the distribution of the final size. 
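
The last claim can be checked with a short worked calculation. As an illustrative special case (our assumption, not a construction taken from the cited work), suppose an individual is latent for a random time L, during which C(t)i(t) = 0, and then infects at the constant rate λ during an infectious period of length I, after which the rate is again 0. The symbol L is introduced here only for illustration.

```latex
\[
\int_0^\infty C(t)\,i(t)\,dt
  = \int_0^{L} 0\,dt + \int_{L}^{L+I} \lambda\,dt + \int_{L+I}^{\infty} 0\,dt
  = \lambda I .
\]
```

The total infectivity equals λI whatever the value of L, so its distribution, and hence the distribution of the final size, is the same as in the model without latency.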

In the section about modelling vaccination it was assumed that the vaccine was perfect 
in the sense that it gave complete immunity to the disease. In reality this is rarely the 
case. There are several more realistic models for vaccine response in which the vaccine 
reduces susceptibility, symptoms and/or infectivity in the event of infection, where all 
these reductions are random (e.g. Halloran et al., 2003, and Becker et al., 2006). For 
example, a vaccine which reduces susceptibility by a factor e, so the average relative 
susceptibility is (1 - e) as compared with the unvaccinated, but has no effect on infectivity 
in case an infection occurs, has a new (higher) critical vaccination coverage 

$$v_c = \frac{1}{e}\left(1 - \frac{1}{R_0}\right).$$

This is true if all individuals have the same reduction e (so-called leaky vaccines), as well as 
if a fraction e of the vaccinated become completely immune and the rest are unaffected by 
the vaccine (all-or-nothing vaccines), or something in between. Having models for vaccine 
efficacy is of course not enough for drawing conclusions in specific situations; it is equally 
important to estimate the various vaccine efficacies. This is most often done in clinical 
trials, and it is usually harder to estimate reduction in infectivity because this has to be 
done indirectly since actual infections are rarely observed; see for example Becker et al. 
(2006) and the forthcoming book by Halloran et al. (2010). 
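
As a rough numerical illustration of the critical vaccination coverage for an imperfect vaccine, the following Python sketch evaluates v_c = (1/e)(1 - 1/R_0). It assumes this standard formula for a vaccine that reduces susceptibility by a factor e and leaves infectivity unchanged. The function and argument names are hypothetical and chosen only for this example.

```python
def critical_vaccination_coverage(r0, e=1.0):
    """Fraction of the community to vaccinate so that no large outbreak can occur.

    r0 : basic reproduction number R_0
    e  : reduction in susceptibility (e = 1 corresponds to a perfect vaccine)
    """
    if not 0.0 < e <= 1.0:
        raise ValueError("e must lie in (0, 1]")
    if r0 <= 1.0:
        return 0.0  # below threshold: no vaccination is needed
    return (1.0 - 1.0 / r0) / e


# Perfect vaccine versus a vaccine with e = 0.8, both for R_0 = 2.
perfect = critical_vaccination_coverage(2.0)
imperfect = critical_vaccination_coverage(2.0, e=0.8)
```

A returned value above 1 indicates that, under these assumptions, vaccinating the whole community would still not prevent a large outbreak.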

Discussion 

Even when trying to include as many realistic features in a model as possible there is a 
limit to how close a model can get to reality, and models can never completely predict 
what will happen in a given situation. It is for example nearly impossible to predict 
how people will adapt and change behaviour as a disease starts spreading. Having said 
this, models can still be (and are!) very useful as guidance for health professionals when 
deciding about preventive measures aiming at reducing the spread of a disease. 

Stochastic epidemic models, or minor modifications of them, can be used also in other 
areas. A classic example is models for the spread of rumours or knowledge (e.g. Daley 
and Gani, 1999, Ch 5). More recently, the World Wide Web has several aspects resembling 
epidemic models: for example computer viruses (which even use terminology from 
infectious diseases) and spread of information. 

The present paper only gives a short introduction to this rather big research field. There 
are several monographs about mathematical models for infectious disease spread: Anderson 
and May (1991) is probably the best known, Diekmann and Heesterbeek (2000) 
has a higher mathematical level, and Keeling and Rohani (2008) also explicitly considers 
disease spread among animals. When it comes to stochastic epidemic models there are, 
for example, the classic book by Bailey (1975), and Andersson and Britton (2000a), who 
also cover inference, a topic which Becker (1989) focuses on. 